AfterWork


Optimizing Neural Networks 


Learning Outcomes 


By the end of this topic, you will have achieved the following learning outcomes: 


• Perform feature engineering techniques to improve a neural network’s performance.
• Perform hyperparameter tuning techniques to improve a neural network’s performance.


"AI is only as good as the data that we can feed it." Paul Hofmann


Reading 


Overview 


In many cases, neural networks perform better than most basic machine learning
algorithms (e.g. Linear Regression, K-Nearest Neighbours, Naive Bayes) without
further optimization. However, we might still need to optimize our model if our
metric of success has not yet been achieved.


Optimizing neural networks is especially important in industries such as
manufacturing, where attention to detail is crucial, and healthcare, where
higher performance leads to better patient outcomes.


There are many ways of optimizing neural networks. They can be categorized as
follows:

• Improving Performance With Data

• Improving Performance With Hyperparameter Tuning

• Improving Performance With Ensembles


Improving Performance With Data 


The techniques under this categorization involve working with provided data in order to 
improve the deep learning model's performance. Such techniques would include: 
1. More Data 
   o Deep learning models work better with more data. When we are able to
     get more training data, our models tend to learn more from it, which in
     turn improves their performance. In this case, getting more data means
     increasing the number of records in our training dataset.
2. Normalizing/Scaling the Data 
   o Rescaling our data can potentially improve the performance of our
     model. This means scaling the data to the bounds of our activation
     functions. We can create different scaled versions of our training
     dataset and evaluate the performance of our model on each. Rescaling
     applies to all our variables, both predictor and label variables:

      i. If we are using sigmoid activation functions, we should rescale our
         data to values between 0 and 1.

      ii. If we are using the hyperbolic tangent (tanh), we should rescale
          to values between -1 and 1.

      iii. We can also perform standardization, which puts different
           variables on the same scale.
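
The three rescaling options above can be sketched in plain Python. This is a minimal illustration, not a prescribed implementation: the helper names `min_max_scale` and `standardize` and the `ages` column are hypothetical.

```python
from statistics import mean, pstdev

def min_max_scale(values, low=0.0, high=1.0):
    """Linearly rescale values so the smallest maps to `low` and the largest to `high`."""
    v_min, v_max = min(values), max(values)
    span = v_max - v_min
    if span == 0:
        raise ValueError("cannot rescale a constant feature")
    return [low + (high - low) * (v - v_min) / span for v in values]

def standardize(values):
    """Shift to zero mean and scale to unit (population) standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        raise ValueError("cannot standardize a constant feature")
    return [(v - mu) / sigma for v in values]

ages = [18.0, 25.0, 40.0, 62.0]  # hypothetical feature column

for_sigmoid = min_max_scale(ages)            # values in [0, 1]
for_tanh = min_max_scale(ages, -1.0, 1.0)    # values in [-1, 1]
standardized = standardize(ages)             # mean 0, standard deviation 1
```

In practice the minimum, maximum, mean and standard deviation are usually computed on the training data only and then reused to transform the validation and test data.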
3. Data Transformation 
   o We can also transform our data in an effort to improve the performance of
     our neural network by applying some of the following data transformation
     techniques:

      i. Making features more Gaussian-like by adjusting any skew with a
         Box-Cox transform, or with a log transform for exponentially
         distributed data.

      ii. Pre-processing the data with PCA.

      iii. Aggregating multiple features into common variables.
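
As a small illustration of adjusting skew, the sketch below measures skewness before and after a log transform. The `skewness` helper and the `raw` feature values are hypothetical and chosen so that the effect is easy to see.

```python
import math

def skewness(values):
    """Population skewness: third central moment divided by the cubed standard deviation."""
    n = len(values)
    mu = sum(values) / n
    m2 = sum((v - mu) ** 2 for v in values) / n
    m3 = sum((v - mu) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

# Hypothetical right-skewed feature: each value doubles the previous one.
raw = [1.0, 2.0, 4.0, 8.0, 16.0, 32.0, 64.0, 128.0]
logged = [math.log(v) for v in raw]  # a log transform requires strictly positive values

skew_before = skewness(raw)     # clearly positive (long right tail)
skew_after = skewness(logged)   # close to zero (symmetric after the transform)
```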
4. Feature Selection 
   o While neural networks can automatically learn which features are most
     effective, one can use domain knowledge to select only the features
     related to the given research problem. This helps the model reach the
     desired outcome using appropriate data, without compromising the
     integrity of the data used.


Improving Performance With Hyperparameter Tuning 


1. Model Diagnostics 
   • The performance of a model might stop improving because it is
     overfitting or underfitting to some degree. One can determine this by
     evaluating the model on the training and validation datasets after each
     epoch and plotting accuracy against epoch.

      o If training accuracy is noticeably better than validation/test
        accuracy, the model is likely overfitting, and one should apply
        regularization techniques.

      o If training and validation/test accuracies are both low, the
        model could be underfitting. One should then consider
        increasing the capacity of the network by adding more
        layers/neurons, or even using more training data.

The diagram below provides us with an understanding of such a plot. 
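
The same rules of thumb can also be checked numerically from the per-epoch accuracies. The sketch below is only an illustration: the `diagnose` function, its two thresholds and the accuracy histories are hypothetical and would need to be chosen per problem.

```python
def diagnose(train_acc, val_acc, gap_threshold=0.10, low_threshold=0.70):
    """Classify a training run from its final-epoch accuracies.

    Both thresholds are hypothetical defaults and should be set per problem.
    """
    final_train, final_val = train_acc[-1], val_acc[-1]
    if final_train - final_val > gap_threshold:
        return "overfitting"
    if final_train < low_threshold and final_val < low_threshold:
        return "underfitting"
    return "reasonable fit"

# Hypothetical per-epoch accuracies for two training runs.
run_a = {"train": [0.70, 0.85, 0.95, 0.99], "val": [0.68, 0.78, 0.80, 0.79]}
run_b = {"train": [0.50, 0.55, 0.58, 0.60], "val": [0.49, 0.54, 0.57, 0.58]}

print(diagnose(run_a["train"], run_a["val"]))  # large train/validation gap
print(diagnose(run_b["train"], run_b["val"]))  # both accuracies low
```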
2. Learning Rate
   • We can try different learning rates for our model by doing the following:
      o Experimenting with very large and very small learning rates.
      o Grid searching learning rate values to see how far we can push
        the network.
   • While changing the learning rate, we should note that larger networks
     need more training, so the learning rate should be balanced against the
     amount of training the network receives.
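
A learning rate grid search can be sketched on a toy problem. The example below is an assumption-laden illustration: it uses plain gradient descent on the one-parameter loss (w - 3)^2 instead of a real network, and the candidate rates are arbitrary.

```python
def final_loss(learning_rate, steps=50, start=0.0):
    """Run gradient descent on the toy loss (w - 3)^2 and return the final loss."""
    w = start
    for _ in range(steps):
        gradient = 2.0 * (w - 3.0)
        w -= learning_rate * gradient
    return (w - 3.0) ** 2

candidate_rates = [0.001, 0.01, 0.1, 1.5]  # from very small to very large
losses = {lr: final_loss(lr) for lr in candidate_rates}
best_rate = min(losses, key=losses.get)

for lr, loss in losses.items():
    print(f"learning rate {lr}: final loss {loss:.3g}")
```

On this toy loss the smallest rates make slow progress within the step budget, while the largest rate overshoots on every step and diverges.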


3. Activation Functions

   • We may need to change our activation functions to ensure that they
     suit the problem at hand. For example, we would switch the output
     activation from sigmoid, which suits binary classification, to a
     linear activation function for a regression problem.
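
To make the difference concrete, the sketch below applies three activation functions to the same values. The `raw_outputs` list is hypothetical and stands for the pre-activation values of an output neuron.

```python
import math

def sigmoid(z):
    """Squash z into (0, 1); suited to a binary-classification output."""
    return 1.0 / (1.0 + math.exp(-z))

def linear(z):
    """Identity activation; suited to an unbounded regression output."""
    return z

raw_outputs = [-4.0, 0.0, 4.0]  # hypothetical pre-activation values

as_probabilities = [sigmoid(z) for z in raw_outputs]  # each value lies in (0, 1)
as_regression = [linear(z) for z in raw_outputs]      # values are left unchanged
as_tanh = [math.tanh(z) for z in raw_outputs]         # each value lies in (-1, 1)
```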

4. Network Topology

   • Changing the structure of our neural network may yield the desired
     improvement. This change of structure might involve changing the number
     of layers or the number of neurons. Below are some of the possible
     changes:
      o Trying one hidden layer with a lot of neurons.
      o Trying a deep network with few neurons per layer.
      o Trying combinations of the above.
      o Trying topology patterns from the literature, e.g. papers solving
        similar problems.
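
When comparing topologies, it can help to know how many trainable parameters each one has. The sketch below counts weights and biases for fully connected networks; the `count_parameters` helper and both layer layouts are hypothetical examples.

```python
def count_parameters(layer_sizes):
    """Count weights and biases in a fully connected network.

    layer_sizes lists the number of units per layer, input layer first.
    """
    return sum(n_in * n_out + n_out for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))

# Hypothetical topologies for a problem with 20 input features and 1 output.
wide_shallow = [20, 128, 1]         # one hidden layer with many neurons
deep_narrow = [20, 16, 16, 16, 1]   # several hidden layers with few neurons each

print(count_parameters(wide_shallow))  # 2817
print(count_parameters(deep_narrow))   # 897
```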
5. Batches and Epochs 

   • The batch size and the number of epochs can also be modified to further
     improve the performance of our model. Small batch sizes combined with a
     large number of training epochs are common in modern deep learning
     implementations.

   • A few such approaches might include:

      o Trying a batch size equal to the training data size.

      o Trying a grid search of different mini-batch sizes.

      o Trying a few epochs and later a large number of epochs.
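
The effect of the batch size on the number of weight updates can be sketched as follows. The helper names are hypothetical, and the sketch assumes one weight update per batch, which is how mini-batch gradient descent is usually run.

```python
import math

def iterate_minibatches(samples, batch_size):
    """Yield consecutive slices of `samples`, each holding at most `batch_size` items."""
    for start in range(0, len(samples), batch_size):
        yield samples[start:start + batch_size]

def updates_per_run(n_samples, batch_size, epochs):
    """Number of weight updates when the weights are updated once per batch."""
    return math.ceil(n_samples / batch_size) * epochs

samples = list(range(1000))  # stand-in for 1000 training records

full_batch = list(iterate_minibatches(samples, batch_size=len(samples)))  # 1 batch
mini_batches = list(iterate_minibatches(samples, batch_size=32))          # 32 batches

print(updates_per_run(1000, 1000, epochs=10))  # 10 updates
print(updates_per_run(1000, 32, epochs=10))    # 320 updates
```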
6. Regularization 

   • Regularization helps us to curb model overfitting during training. This is
     commonly done through the use of the dropout technique, which randomly
     skips neurons during training and forces others in the layer to
     compensate. More specifically, dropout is implemented by keeping each
     neuron active with some probability p and setting its output to 0
     otherwise. This forces the network not to learn redundant information.

   • There are other traditional neural network regularization techniques,
     such as weight decay to penalize large weights and activation
     constraints to penalize large activations.
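
Dropout can be sketched in a few lines. The version below is the "inverted dropout" variant, which is an assumption on top of the description above: kept activations are divided by the keep probability so that nothing needs rescaling at inference time. The `dropout` helper and the `hidden` values are hypothetical.

```python
import random

def dropout(activations, keep_prob, training=True, rng=random):
    """Inverted dropout: keep each activation with probability keep_prob, else set it to 0.

    Kept activations are divided by keep_prob so their expected value is unchanged,
    which lets the layer be used without dropout or rescaling at inference time.
    """
    if not training:
        return list(activations)
    return [a / keep_prob if rng.random() < keep_prob else 0.0 for a in activations]

rng = random.Random(0)                    # fixed seed so the sketch is reproducible
hidden = [0.5, 1.2, 0.8, 2.0, 0.1, 0.9]   # hypothetical hidden-layer activations

train_out = dropout(hidden, keep_prob=0.5, rng=rng)         # some entries zeroed
eval_out = dropout(hidden, keep_prob=0.5, training=False)   # unchanged at inference
```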

7. Optimisation and Loss 

   • We can explore different optimization algorithms, such as Adam and
     RMSprop, as well as different loss functions.
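
As an illustration of what an optimizer such as Adam does, the sketch below applies the standard Adam update rule to a toy one-parameter loss, (w - 3)^2. The function names, the toy loss and the hyperparameter values are assumptions made for the example; deep learning libraries ship their own tested implementations.

```python
import math

def adam_minimize(gradient_fn, w, learning_rate=0.1, beta1=0.9, beta2=0.999,
                  epsilon=1e-8, steps=500):
    """Minimize a one-parameter function with the Adam update rule."""
    m = 0.0  # running mean of gradients (first moment)
    v = 0.0  # running mean of squared gradients (second moment)
    for t in range(1, steps + 1):
        g = gradient_fn(w)
        m = beta1 * m + (1.0 - beta1) * g
        v = beta2 * v + (1.0 - beta2) * g * g
        m_hat = m / (1.0 - beta1 ** t)  # bias correction for the zero initialization
        v_hat = v / (1.0 - beta2 ** t)
        w -= learning_rate * m_hat / (math.sqrt(v_hat) + epsilon)
    return w

def loss(w):
    return (w - 3.0) ** 2

def loss_gradient(w):
    return 2.0 * (w - 3.0)

w_start = 0.0
w_final = adam_minimize(loss_gradient, w_start)  # ends near the minimum at w = 3
```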


Improving Performance With Ensembles 


• Combining Models
   o Multiple deep learning models that perform well on a particular dataset
     but use very different network topologies/techniques can be combined by
     taking the mean of their predictions as the final output. This approach
     can help one model compensate for the shortcomings of another.
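
One simple way of combining models is to average their predicted probabilities for each sample and then threshold the average. The sketch below assumes three hypothetical binary classifiers and made-up probabilities.

```python
def ensemble_average(predictions_per_model):
    """Average per-sample predicted probabilities across models."""
    n_models = len(predictions_per_model)
    return [sum(sample) / n_models for sample in zip(*predictions_per_model)]

# Hypothetical positive-class probabilities from three different networks
# for the same four validation samples.
model_a = [0.9, 0.2, 0.6, 0.4]
model_b = [0.8, 0.1, 0.4, 0.7]
model_c = [0.7, 0.3, 0.3, 0.6]

averaged = ensemble_average([model_a, model_b, model_c])
labels = [1 if p >= 0.5 else 0 for p in averaged]  # final class per sample
```

For the third and fourth samples the individual models disagree, and the averaged probability settles the final label.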


The above methods are basic techniques that we can apply in an effort to improve our
neural networks. Before applying them, we should also consider whether redefining
our problem is an option.

Questions that we could ask while doing this would be:

1. Can our classification problem be better framed as a regression problem, or
   vice versa?
2. Can the binary output in our classification problem become a softmax output?
3. Can we model a sub-problem instead?


This would be an important step in ensuring that our solution is satisfactory. 
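
As an example of reframing the output, a binary sigmoid output can be expressed as a two-class softmax output. The sketch below is a minimal illustration with a hypothetical score `z`; it shows that both formulations give the same positive-class probability when the second class score is fixed at 0.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def softmax(logits):
    """Convert a list of scores into probabilities that sum to 1."""
    shift = max(logits)  # subtracting the maximum avoids overflow in exp
    exps = [math.exp(z - shift) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

z = 1.3  # hypothetical score for the positive class

binary_output = sigmoid(z)             # single-neuron output
two_class_output = softmax([z, 0.0])   # same problem with one output per class
```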